What are we analyzing?
We aim to create an interactive correlation plot using
plotly that allows filtering to observe how correlations
change across different groups (e.g., species in the Iris dataset).
Loads the required libraries and the Iris dataset for analysis.
library(plotly)
library(dplyr)
data(iris)
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
Creates a custom function to calculate regression lines and correlation coefficients for subsets of the data (based on species).
regression_lines <- function(data, species) {
  data_sp <- data %>% filter(Species == species)
  model <- lm(Petal.Length ~ Sepal.Length, data = data_sp)
  x_range <- seq(min(data_sp$Sepal.Length), max(data_sp$Sepal.Length), length.out = 100)
  y_range <- predict(model, newdata = data.frame(Sepal.Length = x_range))
  cor_value <- round(cor(data_sp$Sepal.Length, data_sp$Petal.Length), 2) # Calculate correlation coefficient
  data.frame(Sepal.Length = x_range, Petal.Length = y_range, Species = species, Correlation = cor_value)
}
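The function returns both a fitted line and a correlation coefficient, and the two are closely related: for a simple linear regression, the slope equals r multiplied by the ratio of the standard deviations of y and x. The following is a minimal sketch of that check for one species, using base R only; the variable names (`setosa`, `slope_from_r`) are illustrative and not part of the original code.

```r
# Minimal sketch: within one species, the least-squares slope
# equals r * sd(y) / sd(x). Variable names here are illustrative.
data(iris)
setosa <- iris[iris$Species == "setosa", ]

# Slope taken from the fitted regression model
model <- lm(Petal.Length ~ Sepal.Length, data = setosa)
slope <- unname(coef(model)["Sepal.Length"])

# The same slope reconstructed from the correlation coefficient
r <- cor(setosa$Sepal.Length, setosa$Petal.Length)
slope_from_r <- r * sd(setosa$Petal.Length) / sd(setosa$Sepal.Length)

# TRUE when both routes agree up to floating-point error
isTRUE(all.equal(slope, slope_from_r))
```

This is why the sign of r always matches the direction of the regression line drawn for the same group.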
Generates regression lines and calculates correlation coefficients for each species (Setosa, Versicolor, Virginica).
lines_setosa <- regression_lines(iris, "setosa")
lines_versicolor <- regression_lines(iris, "versicolor")
lines_virginica <- regression_lines(iris, "virginica")
Creates a scatter plot of Sepal.Length vs Petal.Length and includes a filter for species.
fig <- plot_ly(
  data = iris, # Data source
  x = ~Sepal.Length, # X-axis: Sepal Length
  y = ~Petal.Length, # Y-axis: Petal Length
  color = ~Species, # Color by Species
  colors = c('blue', 'orange', 'green'), # Define colors for each species
  type = 'scatter', # Specify the plot type as scatter
  mode = 'markers', # Display markers
  transforms = list(
    list(
      type = 'filter', # Add a filter transform
      target = ~Species, # Target the Species variable
      operation = '=', # Operation type: equals
      value = "setosa" # Initial filter value: setosa
    )
  )
)
Adds dynamic regression lines for each species to the scatter plot.
fig <- fig %>%
  add_lines(
    data = lines_setosa, # Data for Setosa regression line
    x = ~Sepal.Length, # X values for the line
    y = ~Petal.Length, # Y values for the line
    line = list(color = 'blue', width = 1), # Line style
    name = paste0("Setosa (r = ", lines_setosa$Correlation[1], ")") # Legend name with correlation
  ) %>%
  add_lines(
    data = lines_versicolor, # Data for Versicolor regression line
    x = ~Sepal.Length, # X values for the line
    y = ~Petal.Length, # Y values for the line
    line = list(color = 'orange', width = 1), # Line style
    name = paste0("Versicolor (r = ", lines_versicolor$Correlation[1], ")") # Legend name with correlation
  ) %>%
  add_lines(
    data = lines_virginica, # Data for Virginica regression line
    x = ~Sepal.Length, # X values for the line
    y = ~Petal.Length, # Y values for the line
    line = list(color = 'green', width = 1), # Line style
    name = paste0("Virginica (r = ", lines_virginica$Correlation[1], ")") # Legend name with correlation
  )
Includes a dropdown menu to filter the plot by species or show all data points together.
fig <- fig %>%
  layout(
    title = "Dynamic Lines with Correlation Coefficients", # Plot title
    xaxis = list(title = "Sepal Length (cm)"), # X-axis title
    yaxis = list(title = "Petal Length (cm)"), # Y-axis title
    updatemenus = list(
      list(
        buttons = list(
          list(
            method = "restyle", # Method to update the plot
            args = list("transforms[0].value", "setosa"), # Filter for Setosa
            label = "Iris-setosa" # Button label
          ),
          list(
            method = "restyle", # Method to update the plot
            args = list("transforms[0].value", "versicolor"), # Filter for Versicolor
            label = "Iris-versicolor" # Button label
          ),
          list(
            method = "restyle", # Method to update the plot
            args = list("transforms[0].value", "virginica"), # Filter for Virginica
            label = "Iris-virginica" # Button label
          ),
          list(
            method = "restyle", # Method to update the plot
            args = list("transforms[0].value", unique(iris$Species)), # Show all species
            label = "All" # Button label
          )
        ),
        direction = "down", # Dropdown direction
        x = 0.1, # X position of the dropdown
        y = 1.15, # Y position of the dropdown
        showactive = TRUE # Show active button
      )
    )
  )
fig
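The three per-species buttons differ only in the species name, so the same list could be generated rather than typed out. The following is a sketch of one way to do that with `lapply()`; the names `species_buttons`, `all_button`, and `buttons` are hypothetical, and the structure simply mirrors the hand-written buttons in the layout step.

```r
# Sketch: build the dropdown buttons programmatically.
# Object names are illustrative; the list structure copies the
# hand-written buttons from the layout step.
data(iris)
species <- levels(iris$Species)

# One "restyle" button per species, labeled "Iris-<species>"
species_buttons <- lapply(species, function(sp) {
  list(
    method = "restyle",
    args = list("transforms[0].value", sp),
    label = paste0("Iris-", sp)
  )
})

# The "All" button, kept identical to the hand-written version
all_button <- list(
  method = "restyle",
  args = list("transforms[0].value", unique(iris$Species)),
  label = "All"
)

buttons <- c(species_buttons, list(all_button))
```

The resulting `buttons` list would then be passed as `buttons = buttons` inside the `updatemenus` entry, which keeps the menu in step with the data if the set of groups changes.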
What the plot shows:
This interactive plot visualizes the relationship between
Sepal.Length and Petal.Length for the Iris
dataset, with regression lines and correlation coefficients for each
species. Key features include:
A dropdown filter that shows a single species (Setosa, Versicolor, or
Virginica) or all species together.
A regression line for each species, labeled with its correlation coefficient (r). These lines dynamically adjust based on
the selected filter.
This makes the plot useful for comparing the correlation within each species against the pattern across all species combined.
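To put the comparison between individual and combined groups into numbers, the correlations can also be computed directly. This is a minimal base R sketch; `pooled_r` and `by_species` are illustrative names, and the pooled value treats all 150 flowers as a single group.

```r
# Sketch: correlation across all species vs. within each species.
# Object names are illustrative.
data(iris)

# Correlation over the whole dataset, ignoring species
pooled_r <- cor(iris$Sepal.Length, iris$Petal.Length)

# Correlation computed separately for each species
by_species <- sapply(
  split(iris, iris$Species),
  function(d) cor(d$Sepal.Length, d$Petal.Length)
)

# Side-by-side view, rounded like the legend labels
round(c(All = pooled_r, by_species), 2)
```

A pooled correlation can differ noticeably from the within-group values because it also reflects differences between the group means, which is the main reason to filter by species before interpreting r.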
Key Insights:
The regression lines, together with their r values, provide
a clear understanding of the strength and direction of linear
relationships for each group.
Recommendation:
This approach is ideal for datasets with well-defined groups (e.g.,
categories or classes). Use dynamic filtering to explore correlations
efficiently across subsets. The interactive legend and zoom features
further enhance the user experience, making it suitable for exploratory
data analysis and presentations to diverse audiences.